
fix scan iter command issued to different replicas #3220

Open
wants to merge 44 commits into master

Conversation

@agnesnatasya commented Apr 30, 2024

Pull Request check-list

  • Do tests and lints pass with this change?
    • lint passes
    • tests pass on Ubuntu 22.04, Python 3.11.2
Starting Redis tests
= 2352 passed, 1235 skipped, 849 deselected, 29 xpassed, 347 warnings in 163.65s (0:02:43) =
Waiting for 6 cluster nodes to become available
All nodes are available!
= 1571 passed, 1494 skipped, 1396 deselected, 4 xpassed, 244 warnings in 728.56s (0:12:08) =
  • Do the CI tests pass with this change (enable it first in your forked repo and wait for the github action build to finish)? all succeeded in my fork.
  • Is the new or changed code fully tested? added test coverage
  • Is a documentation update included (if this change modifies existing APIs, or introduces new ones)? bugfix only
    • I added some docstrings to the redis.asyncio.sentinel module. I wanted to check whether the rST syntax is correct, but this module is not included in the builddir's index, so I think it's a no-op.
  • Is there an example added to the examples folder (if applicable)? bugfix only
  • Was the change added to CHANGES file? bugfix only

fix scan iter command issued to different replicas

Fixes #3197. See the linked issue for a full description of the bug.
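
For context, here is roughly how the bug shows up (a minimal sketch only; the sentinel address, the service name mymaster, and the key pattern are illustrative assumptions, not details from the issue). A client obtained from a replica-mode SentinelConnectionPool round-robins across replicas, so before this fix each SCAN continuation inside scan_iter could be sent to a different replica, and a cursor returned by one replica is not valid on another.

```python
# Minimal sketch of the failure mode (address, service name and key pattern
# are assumptions). Before this fix, each SCAN issued by scan_iter could
# check out a connection to a *different* replica, so the cursor from one
# replica was replayed against another and keys could be missed or repeated.
from redis.sentinel import Sentinel

sentinel = Sentinel([("localhost", 26379)], socket_timeout=0.5)
replica = sentinel.slave_for("mymaster", socket_timeout=0.5)

# Every iteration below sends SCAN <cursor>; the fix pins the whole
# iteration to the replica that answered the first SCAN.
keys = set(replica.scan_iter(match="user:*", count=100))
```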

@agnesnatasya (Author)

Some of the CI tests fail flakily with a segmentation fault - if I rerun the CI in my fork, they pass most of the time. But I don't have permission to rerun it in the main repository.
Do you know whether this is expected? @gerzse
Thank you!!

@gerzse (Contributor) left a comment

Thanks, this is a valid point. I left some comments in the changes, and I have one more general comment: the implementation now covers the async code. Did you leave out the sync code intentionally, or would it make sense to adapt that one as well?

else:
# Check from the available connections, if any of the connection
# is connected to the host and port that we want
for available_connection in self._available_connections.copy():
Contributor

What if there are many connections? This linear search might slow things down.

Author

Great point! I decided to create a new data structure for the available connections in SentinelConnectionPool, called ConnectionsIndexer, which indexes each connection by its address.
The other connection pools' data structure will still simply be a list.
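
For illustration, an address-indexed container could look roughly like this (a sketch only; apart from the class name ConnectionsIndexer mentioned above, the method and attribute names are assumptions, not the PR's actual code):

```python
from collections import defaultdict


class ConnectionsIndexer:
    """Sketch of an address-indexed replacement for the flat list of
    available connections: looking up a connection to a given (host, port)
    becomes O(1) instead of a linear scan over every idle connection."""

    def __init__(self):
        self._by_address = defaultdict(list)

    def append(self, connection):
        # Return a connection to the pool, bucketed by the replica it points to.
        self._by_address[(connection.host, connection.port)].append(connection)

    def pop(self):
        # Hand out any idle connection, mimicking list.pop() for callers
        # that do not care which replica they talk to.
        for bucket in self._by_address.values():
            if bucket:
                return bucket.pop()
        raise IndexError("pop from an empty ConnectionsIndexer")

    def get_connection(self, host, port):
        # Hand out an idle connection to a specific replica, if one exists.
        bucket = self._by_address.get((host, port))
        return bucket.pop() if bucket else None
```

The common path (any idle connection) stays as cheap as before, while the "same replica as the first SCAN" lookup becomes constant time.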

@@ -122,6 +147,7 @@ def __init__(self, service_name, sentinel_manager, **kwargs):
self.sentinel_manager = sentinel_manager
self.master_address = None
self.slave_rr_counter = None
self._request_id_to_replica_address = {}
Contributor

Would it make sense to use a more general name, e.g. context_id instead of request_id? Would that better express the fact that you are basically trying to run several commands in the same context?

Author

I don't have a strong opinion, but I think request_id conveys the semantics pretty well. context sounds broad, and I'm afraid it would get extended for other 'context storing' purposes, although this really just holds entries for iter requests.
I can rename it to self._iter_req_id_to_... instead, but let me know if you prefer other names!

await self.release(connection)
raise
# Store the connection to the dictionary
self._request_id_to_replica_address[iter_req_id] = (
Contributor

When would these entries be removed from the dict? So that it does not grow indefinitely.

Author

Yeah, good point. I created a separate cleanup method that is called at the end of the scan_iter-family commands.
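
Roughly, the idea discussed in this thread could look like the sketch below (illustrative only; the dictionary name comes from the diff above, while the class, method names, and id generation are assumptions, not the PR's actual code):

```python
import uuid


class _IterRequestTracker:
    """Illustrative sketch: remember which replica served the first SCAN of
    an iteration so every later cursor continuation is routed back to it,
    then drop the entry when the iteration ends so the mapping cannot grow
    without bound."""

    def __init__(self):
        self._request_id_to_replica_address = {}

    def remember(self, iter_req_id, connection):
        # Called right after the first SCAN of an iteration gets a connection.
        self._request_id_to_replica_address[iter_req_id] = (
            connection.host,
            connection.port,
        )

    def address_for(self, iter_req_id):
        # Later SCANs ask for the pinned replica; None means "not pinned yet".
        return self._request_id_to_replica_address.get(iter_req_id)

    def cleanup(self, iter_req_id):
        # Invoked at the end of a scan_iter-family command.
        self._request_id_to_replica_address.pop(iter_req_id, None)


# One id per scan_iter() call, generated before the first SCAN is sent.
iter_req_id = uuid.uuid4()
```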

@gerzse (Contributor) commented Jun 14, 2024

Some of the CI tests fail flakily with a segmentation fault - if I rerun the CI in my fork, they pass most of the time. But I don't have permission to rerun it in the main repository. Do you know whether this is expected? @gerzse Thank you!!

Those segmentation faults have been there for a long time, to be honest I have no idea why they happen. They are so annoying. Eventually I'll spend some time trying to dig deeper.

@agnesnatasya (Author) left a comment

Thank you for reviewing this @gerzse, appreciate it! Great points, I've addressed your comments. This is also a bug in the sync code, I've added the fix to the sync code as well!

@agnesnatasya (Author) left a comment

Sorry it took a while for me to address your comments @gerzse, it involved more changes than I expected, but this version should be ready for another round of review! Thank you very much!
I ran the CI workflow on my fork; it's passing except for the tests that I believe are being fixed in #3324.

Comment on lines -1188 to -1204
# ensure this connection is connected to Redis
connection.connect()
# if client caching is not enabled connections that the pool
# provides should be ready to send a command.
# if not, the connection was either returned to the
# pool before all data has been read or the socket has been
# closed. either way, reconnect and verify everything is good.
# (if caching enabled the connection will not always be ready
# to send a command because it may contain invalidation messages)
try:
if connection.can_read() and connection.client_cache is None:
raise ConnectionError("Connection has data")
except (ConnectionError, OSError):
connection.disconnect()
connection.connect()
if connection.can_read():
raise ConnectionError("Connection not ready")
Author

I refactored this into ensure_connection so that the subclass can call it directly. There's also a similar block of code in BlockingConnectionPool, except that it doesn't have the and connection.client_cache is None clause. But it looks to me like there's nothing special about client_cache in BlockingConnectionPool.
Should we refactor that out as well, or is the absence of that clause an intentional distinction for BlockingConnectionPool?
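
For reference, the extracted helper could look roughly like this (a sketch built from the removed block quoted above; the method name ensure_connection comes from this comment, while the simplified class body is an assumption rather than the PR's exact code):

```python
from redis.exceptions import ConnectionError


class ConnectionPool:
    # Simplified sketch: the readiness check that used to live inline in
    # get_connection() becomes a helper that SentinelConnectionPool and
    # BlockingConnectionPool can call on any connection they hand out.
    def ensure_connection(self, connection):
        # Make sure the connection is established.
        connection.connect()
        try:
            # A pooled connection should have no unread data unless client
            # caching is enabled (invalidation messages may be pending).
            if connection.can_read() and connection.client_cache is None:
                raise ConnectionError("Connection has data")
        except (ConnectionError, OSError):
            # Reconnect and verify the connection is usable.
            connection.disconnect()
            connection.connect()
            if connection.can_read():
                raise ConnectionError("Connection not ready")
```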

from redis.sentinel import Sentinel, SentinelConnectionPool, SentinelManagedConnection
from redis.utils import HIREDIS_AVAILABLE

pytestmark = pytest.mark.skipif(HIREDIS_AVAILABLE, reason="PythonParser only")
Author

I marked these tests to ignore hiredis. I'm not very familiar with the hiredis setup, but it looks like the hiredis code paths don't really work well when we call connection-related operations like can_read (and as far as I understand this is unrelated to my change). Let me know if I should not skip hiredis.

@agnesnatasya requested a review from gerzse July 20, 2024 10:27
Development

Successfully merging this pull request may close these issues.

scan_iter family commands gives inconsistent result when using Sentinel connection pool
2 participants